feat(v2): metastore index #3586
Conversation
```go
Config *Config

partitionMu      sync.Mutex
loadedPartitions *lru.Cache[PartitionKey, *indexPartition]
```
I think the cache should be per partition+tenant to avoid interference
If you mean using partition+tenant as the key, that would also suffer from interference (busy tenants would push others out of the cache). For real separation we could maintain separate caches per tenant (complex, costly) or go back to the previous solution (an unbounded TTL-based cache, with uneven memory usage).
As discussed separately, this cache is also not suitable because a large query can unload the active (for writes) partition. That could again be solved by switching to a different caching strategy (ARC, 2Q, etc.).
Personally, given our usage patterns (frequent writes, infrequent reads), I lean towards a custom solution similar to what we had before: an upper bound on how many items stay in memory, plus explicit checks that prevent the write partition from being unloaded.
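One way to read this suggestion is a bounded LRU-style cache that skips pinned entries during eviction, so the active write partition can never be pushed out by a read-heavy query. A minimal sketch under those assumptions (all names and sizes here are illustrative, not the actual PR code):

```go
package main

import (
	"container/list"
	"fmt"
)

// partitionCache is a hypothetical bounded cache whose eviction skips
// pinned keys (e.g. the active write partition).
type partitionCache struct {
	maxSize int
	order   *list.List               // front = most recently used
	items   map[string]*list.Element // key -> list element
	pinned  map[string]bool          // keys that must stay resident
}

type entry struct {
	key   string
	value any
}

func newPartitionCache(maxSize int) *partitionCache {
	return &partitionCache{
		maxSize: maxSize,
		order:   list.New(),
		items:   make(map[string]*list.Element),
		pinned:  make(map[string]bool),
	}
}

func (c *partitionCache) Pin(key string)   { c.pinned[key] = true }
func (c *partitionCache) Unpin(key string) { delete(c.pinned, key) }

func (c *partitionCache) Get(key string) (any, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

func (c *partitionCache) Put(key string, value any) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key, value})
	// Evict from the back, skipping pinned entries so the write
	// partition is never unloaded by a large read.
	for c.order.Len() > c.maxSize {
		el := c.order.Back()
		for el != nil && c.pinned[el.Value.(*entry).key] {
			el = el.Prev()
		}
		if el == nil {
			return // everything resident is pinned; allow overshoot
		}
		delete(c.items, el.Value.(*entry).key)
		c.order.Remove(el)
	}
}

func main() {
	c := newPartitionCache(2)
	c.Put("write-partition", 1)
	c.Pin("write-partition")
	c.Put("read-a", 2)
	c.Put("read-b", 3) // evicts read-a, not the pinned write partition
	_, ok := c.Get("write-partition")
	fmt.Println(ok) // true
	_, ok = c.Get("read-a")
	fmt.Println(ok) // false
}
```

The same effect could be achieved with `lru.Cache` plus an external "do not evict" check, but a custom structure makes the pinning invariant explicit.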
```go
	s.tenants[b.TenantId] = ten
}

ten.blocks[b.Id] = b
```
We should probably reject the insert if the block is already present: first write wins, since block meta is immutable.
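A first-write-wins insert could look like the sketch below. The `tenantIndex`/`blockMeta` types and the error value are simplified placeholders for illustration, not the actual types from the PR:

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins for the real metastore types.
type blockMeta struct {
	Id       string
	TenantId string
}

type tenantIndex struct {
	blocks map[string]*blockMeta
}

var errBlockExists = errors.New("block already present")

// insertBlock rejects duplicates: block meta is immutable, so the
// first write wins and a repeated insert is an error.
func (t *tenantIndex) insertBlock(b *blockMeta) error {
	if _, ok := t.blocks[b.Id]; ok {
		return errBlockExists
	}
	t.blocks[b.Id] = b
	return nil
}

func main() {
	ten := &tenantIndex{blocks: make(map[string]*blockMeta)}
	fmt.Println(ten.insertBlock(&blockMeta{Id: "b1"})) // <nil>
	fmt.Println(ten.insertBlock(&blockMeta{Id: "b1"})) // block already present
}
```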
```go
for _, t := range i.store.ListTenants(meta.Key, s) {
	te := &indexTenant{
		blocks: make(map[string]*metastorev1.BlockMeta),
	}
	for _, b := range i.store.ListBlocks(meta.Key, s, t) {
		te.blocks[b.Id] = b
	}
	sh.tenants[t] = te
}
```
I believe we should avoid loading all tenants for a partition (I just can't find a use case)
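One alternative consistent with this comment is loading tenants lazily: populate a tenant's blocks from the store only on first access, instead of eagerly listing every tenant in the shard. A hypothetical sketch (the `store` interface and types here are simplified placeholders):

```go
package main

import "fmt"

type blockMeta struct{ Id string }

// store is a placeholder for the metastore's persistent index store.
type store interface {
	ListBlocks(shard uint32, tenant string) []*blockMeta
}

type indexTenant struct {
	blocks map[string]*blockMeta
}

type indexShard struct {
	store   store
	shard   uint32
	tenants map[string]*indexTenant
}

// getTenant loads a single tenant from the store on first access and
// caches it, so untouched tenants are never read.
func (s *indexShard) getTenant(tenant string) *indexTenant {
	if te, ok := s.tenants[tenant]; ok {
		return te
	}
	te := &indexTenant{blocks: make(map[string]*blockMeta)}
	for _, b := range s.store.ListBlocks(s.shard, tenant) {
		te.blocks[b.Id] = b
	}
	s.tenants[tenant] = te
	return te
}

// fakeStore counts reads so we can observe the lazy behaviour.
type fakeStore struct{ calls int }

func (f *fakeStore) ListBlocks(shard uint32, tenant string) []*blockMeta {
	f.calls++
	return []*blockMeta{{Id: tenant + "-block"}}
}

func main() {
	fs := &fakeStore{}
	sh := &indexShard{store: fs, tenants: map[string]*indexTenant{}}
	sh.getTenant("tenant-a")
	sh.getTenant("tenant-a") // cached; no second store read
	fmt.Println(fs.calls)    // 1
}
```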
```go
func (i *Index) tryDelete(key PartitionKey, shard uint32, tenant string, blockId string) (*metastorev1.BlockMeta, *PartitionMeta, bool) {
	meta := i.findPartitionMeta(key)
	if meta == nil {
		return nil, nil, false
	}

	p := i.getPartition(meta)
```
I think we shouldn't load partition from disk here:
- If partition (its part you want to modify) is in memory – update it.
- Delete the block from store. (and make sure we created tombstone for the block – out of scope for the PR)
Agreed. We still need to find the partition for the block in order to delete it, but there is no need to load the partition itself.
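The flow agreed on here can be sketched as: update the in-memory copy only if the partition is already resident, and always delete from the store without loading anything from disk. This is a simplified illustration with placeholder types, not the PR's implementation:

```go
package main

import "fmt"

type blockMeta struct{ Id string }

// index holds resident partitions in memory; stored stands in for the
// persistent block store.
type index struct {
	loaded map[string]map[string]*blockMeta // partition key -> block id -> meta
	stored map[string]bool                  // persisted block ids
}

// deleteBlock removes a block from the store and, only if the owning
// partition happens to be in memory, from the in-memory copy too.
// It never loads a partition from disk just to delete a block.
// (Tombstone creation is out of scope, as noted above.)
func (i *index) deleteBlock(partition, blockId string) bool {
	if !i.stored[blockId] {
		return false
	}
	if p, ok := i.loaded[partition]; ok {
		delete(p, blockId) // partition is resident: keep memory consistent
	}
	delete(i.stored, blockId)
	return true
}

func main() {
	idx := &index{
		loaded: map[string]map[string]*blockMeta{},
		stored: map[string]bool{"b1": true},
	}
	fmt.Println(idx.deleteBlock("p1", "b1")) // true: store-only delete, partition not loaded
	fmt.Println(idx.deleteBlock("p1", "b1")) // false: already gone
}
```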
Introduces the metastore/index package, which adds a time-partitioned block index and integrates it with the existing flows for adding, compacting, and querying blocks. This is a draft with a few parts missing (some of which are marked with FIXME and TODO comments). It is a breaking change meant for the v2 work stream: once merged, existing blocks will not be reachable.